Generation of Attribute Value Taxonomies from Data and Their Use in Data-Driven Construction of Accurate and Compact Naive Bayes Classifiers

Authors

  • Dae-Ki Kang
  • Adrian Silvescu
  • Jun Zhang
  • Vasant Honavar

Abstract

Attribute Value Taxonomies (AVTs) have been shown to be useful in constructing compact and robust classifiers. However, in many application domains, human-designed AVTs are unavailable. To address this problem, we introduce AVT-Learner, an algorithm for the automated construction of attribute value taxonomies from data. AVT-Learner uses Hierarchical Agglomerative Clustering (HAC) to cluster attribute values based on the distribution of classes that co-occur with those values. We describe experiments with AVT-Learner on several benchmark data sets that compare the performance of AVT-NBL (an AVT-guided Naive Bayes Learner) with that of the standard Naive Bayes Learner (NBL), applied both to the original data set and to a data set generated by augmenting the original data set with additional attributes corresponding to the nodes in the AVTs. Our results show that the AVTs generated by AVT-Learner are competitive with human-generated AVTs (in cases where such AVTs are available). AVT-NBL using AVTs generated by AVT-Learner achieves classification accuracies that are comparable to or higher than those obtained by NBL, and the resulting classifiers are significantly more compact than those generated by NBL.
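The clustering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single nominal attribute given as (value, class) pairs, and it uses Jensen-Shannon divergence between class distributions as the dissimilarity between clusters (an assumption; the abstract only states that clustering is based on the class distributions). The closest pair of clusters is merged repeatedly until a single root remains. The helper `node_features` is hypothetical and shows how each AVT node containing a value could become one extra binary attribute in the augmented data set.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """One AVT node: a set of attribute values and their pooled class counts."""
    values: frozenset
    class_counts: Counter
    children: List["Node"] = field(default_factory=list)


def class_distribution(counts: Counter, classes: List[str]) -> List[float]:
    total = sum(counts[c] for c in classes)
    return [counts[c] / total for c in classes]


def kl(p: List[float], q: List[float]) -> float:
    # Terms with p_i = 0 contribute nothing; q_i > 0 wherever p_i > 0 below.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: List[float], q: List[float]) -> float:
    # Assumed dissimilarity measure: Jensen-Shannon divergence (base 2).
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def learn_avt(rows: List[Tuple[str, str]]) -> Node:
    """Bottom-up HAC over the values of ONE attribute; rows are (value, class) pairs."""
    classes = sorted({label for _, label in rows})
    counts: Dict[str, Counter] = defaultdict(Counter)
    for value, label in rows:
        counts[value][label] += 1
    # Start with one leaf per observed attribute value.
    nodes = [Node(frozenset([v]), c) for v, c in sorted(counts.items())]
    while len(nodes) > 1:
        # Find the pair of clusters whose class distributions are most similar.
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = js_divergence(
                    class_distribution(nodes[i].class_counts, classes),
                    class_distribution(nodes[j].class_counts, classes),
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        a, b = nodes[i], nodes[j]
        # The merged node pools the class counts of its two children.
        merged = Node(a.values | b.values, a.class_counts + b.class_counts, [a, b])
        nodes.pop(j)  # j > i, so popping j first keeps index i valid
        nodes.pop(i)
        nodes.append(merged)
    return nodes[0]


def node_features(root: Node, value: str) -> List[frozenset]:
    """Hypothetical helper: AVT nodes on the root-to-leaf path that contain `value`.
    Each such node could serve as one extra binary attribute in an augmented data set."""
    if value not in root.values:
        return []
    path, node = [], root
    while True:
        path.append(node.values)
        nxt = [c for c in node.children if value in c.values]
        if not nxt:
            return path
        node = nxt[0]


if __name__ == "__main__":
    rows = [("red", "+"), ("red", "+"), ("pink", "+"), ("blue", "-"), ("navy", "-")]
    root = learn_avt(rows)
    for values in node_features(root, "red"):
        print(sorted(values))
```

On this toy input, "red" and "pink" (which co-occur only with class "+") are grouped under one internal node, and "blue" and "navy" under another, so the path for "red" runs from the root through {pink, red} to the leaf {red}. The naive pairwise search is cubic in the number of attribute values, which is acceptable for a sketch but would need caching of pairwise divergences for attributes with many values.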


Similar resources

Learning Naive Bayes Classifiers From Attribute Value Taxonomies and Partially Specified Data

Partially specified data are commonplace in many practical applications of machine learning where different instances are described at different levels of precision relative to an attribute value taxonomy (AVT). This paper describes AVT-NBL – a variant of the Naïve Bayes Learning algorithm that effectively exploits user-supplied attribute value taxonomies to construct compact and accurate Naïve...


Learning Naïve Bayes Classifiers From Attribute Value Taxonomies and Partially Specified Data

Partially specified data are commonplace in many practical applications of machine learning where different instances are described at different levels of precision relative to an attribute value taxonomy (AVT). This paper describes AVT-NBL, an extension of the Naïve Bayes Learning algorithm that effectively exploits user-supplied attribute value taxonomies to construct compact and accurate Naïve...


Multinomial Event Model Based Abstraction for Sequence and Text Classification

In many machine learning applications that deal with sequences, there is a need for learning algorithms that can effectively utilize the hierarchical grouping of words. We introduce Word Taxonomy guided Naive Bayes Learner for the Multinomial Event Model (WTNBL-MN) that exploits word taxonomy to generate compact classifiers, and Word Taxonomy Learner (WTL) for automated construction of word tax...


Diagnosis of Pulmonary Tuberculosis Using Artificial Intelligence (Naive Bayes Algorithm)

Background and Aim: Despite the implementation of effective preventive and therapeutic programs, no significant success has been achieved in reducing tuberculosis. One of the reasons is delay in diagnosis. Therefore, a diagnostic aid system can help diagnose tuberculosis early. The purpose of this research was to evaluate the role of the Naive Bayes algorithm as a...


The construction and exploration of attribute-value taxonomies in data mining

With the widespread computerization in science, business, and government, the efficient and effective discovery of interesting information and knowledge from large databases becomes essential. Knowledge Discovery in Databases (KDD) or Data Mining plays a key role in data analysis and has been found to be beneficial in many fields. Much previous research and many applications have focused on the...



Publication date: 2004